Towards Robust Speech Acquisition using Sensor Arrays
Authors
Abstract
An integrated system approach was developed to address the problem of distant speech acquisition in multi-party meetings using multiple microphones and cameras. Microphone array processing techniques offer a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering and directional discrimination. These techniques rely on accurate speaker locations for optimal performance. Tracking speaker locations accurately from audio alone was not successful, owing to the discreteness of audio-based estimates and their vulnerability to noise sources and reverberation. Multi-modal approaches using audio-visual sensors provided the required accuracy: robust and accurate speaker locations were achieved by exploiting the complementary advantages of the two modalities. In the proposed approach, an audio-visual multi-person tracker was used to track active speakers continuously with high accuracy. The speech processing system provided microphone-array-based speech enhancement and automatic speech/non-speech segmentation, whose output served as input to speech recognition. The approach was evaluated on data recorded in a real meeting room for stationary-speaker, moving-speaker and overlapping-speech scenarios. The results showed that the speech enhancement and recognition performance achieved by tracking the active speaker and then applying microphone array processing was significantly better than that of a single table-top microphone and comparable to that of a lapel microphone in all three scenarios. Overall, the integrated system was shown to be an appropriate means of robust distant speech acquisition.
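The abstract does not say which spatial filter the system uses. As a rough illustration only, the sketch below assumes the simplest one, a delay-and-sum beamformer, steered to a speaker position such as a tracker might supply. The function names, the fixed speed of sound, and the whole-sample delays are all assumptions made for this example and are not taken from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_positions, source_position, sample_rate):
    """Per-microphone delays in whole samples, relative to the farthest microphone."""
    distances = [math.dist(mic, source_position) for mic in mic_positions]
    farthest = max(distances)
    # Microphones closer to the source hear the wavefront earlier,
    # so they are delayed more to line up with the farthest one.
    return [round((farthest - d) / SPEED_OF_SOUND * sample_rate) for d in distances]


def delay_and_sum(channels, delays):
    """Delay each channel by its integer delay, then average across channels."""
    length = len(channels[0])
    output = [0.0] * length
    for signal, delay in zip(channels, delays):
        for n in range(delay, length):
            output[n] += signal[n - delay]
    return [value / len(channels) for value in output]
```

Signals arriving from the steered position add coherently after alignment, while sound from other directions is averaged with mismatched delays and is attenuated. A practical system would normally use fractional delays and update the steering as the speaker moves.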
Similar resources
Speech Recognition Using Ad-hoc Microphone Arrays
While close-talking microphones give the best signal quality and produce the highest accuracy from current Automatic Speech Recognition (ASR) systems, the speech signal enhanced by a microphone array has been shown to be an effective alternative in a noisy environment. The use of microphone arrays in contrast to close-talking microphones alleviates the feeling of discomfort and distraction to the...
Planar superdirective microphone arrays for speech acquisition in the car
In this paper we investigate a small broadside planar (2D) superdirective microphone array for speech acquisition in the car and compare its performance to linear arrays. The objective of this investigation is to replace an expensive directional microphone by a small array of inexpensive omnidirectional sensors. Since the array was designed to be used in the car environment it has to satisfy re...
Theory and design of broadband sensor arrays with frequency invariant far-field beam patterns
The theory and design of a broadband array of sensors with a frequency invariant far-field beam pattern over an arbitrarily wide design bandwidth is presented. The frequency invariant beam pattern property is defined in terms of a continuously distributed sensor, and the problem of designing a practical sensor array is then treated as an approximation to this continuous sensor using a discrete set o...
Theory and Design of Broadband Sensor Arrays with Frequency Invariant Far-field Beam Patterns
The theory and design of a broadband array of sensors with a frequency invariant far-field beam pattern over an arbitrarily wide design bandwidth is presented. The frequency invariant beam pattern property is defined in terms of a continuously distributed sensor, and the problem of designing a practical sensor array is then treated as an approximation to this continuous sensor using a discrete s...
Beamforming for a source located in the interior of a sensor array
We introduce a framework for acquiring a signal from a source that is located within the midst of a randomly distributed sensor array. This problem arises in speech acquisition with microphone arrays. Based on psychoacoustic considerations, we formulate a constrained optimization problem in which the array weights are chosen to minimize the response to farfield sources while maintaining a unity...